446 research outputs found

    The Role of Bioinformatics in Drug Design (Die Rolle der Bioinformatik im Wirkstoffentwurf)

    Efficient Interpretation of Tandem Mass Tags in Top-Down Proteomics

    Mass spectrometry is the major analytical tool for the identification and quantification of proteins in biological samples. In so-called top-down proteomics, separation and mass spectrometric analysis are performed at the level of intact proteins, without preparatory digestion steps. It has been shown that the tandem mass tag (TMT) labeling technology, which is often used for quantification based on digested proteins (bottom-up studies), can be applied in top-down proteomics as well. This, however, leads to a complex interpretation problem, in which measured peaks must be annotated with their respective generating protein, the number of charges, and the a priori unknown number of TMT groups attached to this protein. In this work, we give an algorithm for the efficient enumeration of all valid annotations that fulfill the available experimental constraints. Applying the algorithm to real-world data, we show that the annotation problem can indeed be solved efficiently. However, our experiments also demonstrate that reliable annotation in complex mixtures requires at least partial sequence information as well as high mass accuracy and resolution to go beyond the proof-of-concept stage.
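To make the annotation problem concrete, the following is a minimal brute-force sketch, not the efficient enumeration algorithm the abstract refers to. It assumes that a peak at m/z value `peak_mz` is explained by a protein of neutral mass M carrying n TMT groups and z protons, so that m/z = (M + n * TMT_MASS + z * PROTON_MASS) / z. The value of `TMT_MASS`, the protein names and masses, and the function name `enumerate_annotations` are illustrative assumptions.

```python
from typing import Dict, List, NamedTuple

PROTON_MASS = 1.007276  # Da
TMT_MASS = 229.162932   # Da; assumed mass added per TMT group (hypothetical choice of reagent)


class Annotation(NamedTuple):
    protein: str
    charge: int
    tmt_groups: int
    mz: float


def enumerate_annotations(
    peak_mz: float,
    proteins: Dict[str, float],
    max_charge: int,
    max_tmt: int,
    tol_ppm: float,
) -> List[Annotation]:
    """Return every (protein, charge, TMT count) whose theoretical m/z
    lies within tol_ppm of the measured peak. Naive triple loop."""
    hits = []
    for name, mass in proteins.items():
        for z in range(1, max_charge + 1):
            for n in range(max_tmt + 1):
                mz = (mass + n * TMT_MASS + z * PROTON_MASS) / z
                if abs(mz - peak_mz) <= peak_mz * tol_ppm * 1e-6:
                    hits.append(Annotation(name, z, n, mz))
    return hits


# Hypothetical candidate proteins (neutral masses in Da).
candidates = {"protein_A": 10000.0, "protein_B": 12000.0}
# Simulated peak: protein_A with 3 TMT groups at charge 10.
peak = (10000.0 + 3 * TMT_MASS + 10 * PROTON_MASS) / 10
for hit in enumerate_annotations(peak, candidates, max_charge=20, max_tmt=10, tol_ppm=10.0):
    print(hit)
```

Even this toy version shows why the abstract calls for high mass accuracy: loosening `tol_ppm` or adding more candidate proteins quickly admits additional, competing annotations for the same peak.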

    High-accuracy peak picking of proteomics data

    A new peak picking algorithm for the analysis of mass spectrometric (MS) data is presented. It is independent of the underlying machine or ionization method, and is able to resolve highly convoluted and asymmetric signals. The method uses the multiscale nature of spectrometric data by first detecting the mass peaks in the wavelet-transformed signal before a given asymmetric peak function is fitted to the raw data. In an optional third stage, the resulting fit can be further improved using techniques from nonlinear optimization. In contrast to currently established techniques (e.g. SNAP, Apex), our algorithm is able to separate overlapping peaks of multiply charged peptides in ESI-MS data of low resolution. Its improved accuracy with respect to peak positions makes it a valuable preprocessing method for MS-based identification and quantification experiments. The method has been validated on a number of different annotated test cases, where it compares favorably in both runtime and accuracy with currently established techniques. An implementation of the algorithm is freely available in our open source framework OpenMS (www.open-ms.de).
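The first stage described above, detecting peaks in the wavelet-transformed signal, can be sketched as follows. This is only an illustration under simplifying assumptions, not the OpenMS implementation: it uses a single fixed scale `a`, a Mexican-hat (Ricker) wavelet, and a relative threshold `rel_threshold`, all of which are choices made for this example. The peak-function fit and the optional nonlinear optimization stage are not shown.

```python
import math
from typing import List


def ricker(half_width: int, a: float) -> List[float]:
    """Mexican-hat (Ricker) wavelet of scale a, sampled on [-half_width, half_width]."""
    out = []
    for t in range(-half_width, half_width + 1):
        x = (t / a) ** 2
        out.append((1.0 - x) * math.exp(-x / 2.0))
    return out


def wavelet_response(signal: List[float], a: float) -> List[float]:
    """Correlate the signal with the wavelet (zero padding at the borders)."""
    half = int(5 * a)
    kernel = ricker(half, a)
    n = len(signal)
    resp = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += signal[j] * w
        resp.append(acc)
    return resp


def detect_peaks(signal: List[float], a: float, rel_threshold: float = 0.5) -> List[int]:
    """Indices of local maxima of the wavelet response above a relative cutoff."""
    resp = wavelet_response(signal, a)
    cutoff = rel_threshold * max(resp)
    return [
        i
        for i in range(1, len(resp) - 1)
        if resp[i] >= cutoff and resp[i] > resp[i - 1] and resp[i] >= resp[i + 1]
    ]


# Synthetic test signal: two Gaussian peaks centred at indices 30 and 70.
signal = [
    1.0 * math.exp(-((i - 30) ** 2) / 18.0) + 0.8 * math.exp(-((i - 70) ** 2) / 18.0)
    for i in range(100)
]
print(detect_peaks(signal, a=3.0))
```

Because the wavelet has (approximately) zero mean, a slowly varying baseline contributes little to the response, which is one reason wavelet-domain detection is attractive before fitting a peak shape to the raw data.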

    Improved Flexibility and Scalability by Interpreting Story Diagrams

    In this paper, we present an interpreter for Story Diagrams working on Eclipse Modeling Framework (EMF) models. The interpreter provides a more flexible and, under certain circumstances, a more scalable solution than the compiled Java code generated from Story Diagrams by Fujaba. […] of Dynamic EMF even allows the evolution of meta models at runtime. Story Diagrams can now be modeled and executed within Eclipse. They can be modified and re-executed by the Story Diagram interpreter immediately, without recompiling the source code and restarting the application. Our implementation also supports higher-order transformations by using Story Diagrams to modify other Story Diagrams. […] generation is not applicable, like running systems. While interpretation obviously results in performance drawbacks, we demonstrate that the Story Diagram interpreter is able to improve the performance in certain worst-case situations compared to the average generated code. This is achieved by a dynamic ordering of the matching process, which considers the actual number of elements in an association at runtime. Such a dynamic ordering can reduce the matching effort considerably. In contrast, Fujaba-generated code uses a static matching strategy. Whereas the Fujaba Story Diagrams have potentially high performance fluctuations, the performance of the Story Diagram interpreter is steadier and more scalable than that of the generated Java code.
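The effect of ordering the matching process by the actual size of each association can be illustrated with a small sketch. This is not the Story Diagram interpreter itself; the association names `owns` and `manages`, the object identifiers, and both functions are hypothetical and serve only to compare a fixed (static) order with one chosen at runtime.

```python
from typing import Dict, List, Set, Tuple


def match_common_targets(
    links: Dict[str, Set[str]], order: List[str]
) -> Tuple[List[str], int]:
    """Find objects reachable over every association named in `order`.

    Iterates over the first association and tests membership in the rest.
    Returns the matches and the number of candidates that were visited.
    """
    first, rest = order[0], order[1:]
    visited = 0
    matches = []
    for candidate in links[first]:
        visited += 1
        if all(candidate in links[name] for name in rest):
            matches.append(candidate)
    return sorted(matches), visited


def dynamic_order(links: Dict[str, Set[str]], names: List[str]) -> List[str]:
    """Order associations by their actual number of elements at runtime."""
    return sorted(names, key=lambda name: len(links[name]))


# Hypothetical runtime state of one bound object: a large and a small association.
links = {
    "owns": {f"o{i}" for i in range(1000)},
    "manages": {"o3", "o7"},
}

static_matches, static_cost = match_common_targets(links, ["owns", "manages"])
dynamic_matches, dynamic_cost = match_common_targets(
    links, dynamic_order(links, ["owns", "manages"])
)
print(static_matches == dynamic_matches, static_cost, dynamic_cost)
```

Both orders find the same matches, but the static order visits every element of the large association, while the dynamic order starts from the small one. A static strategy cannot know which association will be small in a given model instance, which is consistent with the performance fluctuations the abstract attributes to generated code.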

    Biomolecules in a structured solvent: a novel formulation of nonlocal electrostatics and its numerical solution

    The accurate modeling of the dielectric properties of water is crucial for many applications in physics, computational chemistry, and molecular biology. In principle, this becomes possible in the framework of nonlocal electrostatics, but since the complexity of the underlying equations seemed overwhelming, the approach was considered infeasible for biomolecular purposes. In this work, we propose a novel formulation of nonlocal electrostatics which for the first time allows for numerical solutions for the nontrivial molecular geometries arising in the applications mentioned above. The approach is illustrated by its application to simple geometries, and its usefulness for the computation of solvation free energies is demonstrated for the case of monoatomic ions. In order to extend the applicability of nonlocal electrostatics to nontrivial systems such as large biomolecules, a boundary element method for its numerical solution is developed and implemented. The resulting solver is then used to predict the free energies of solvation of polyatomic molecules with high accuracy. Finally, the nonlocal electrostatic potential of the protein trypsin is computed and interpreted qualitatively.

    A minimally invasive multiple marker approach allows highly efficient detection of meningioma tumors

    BACKGROUND: The development of effective frameworks that permit an accurate diagnosis of tumors, especially in their early stages, remains a grand challenge in the field of bioinformatics. Our approach applies statistical learning techniques to multiple tumor antigen markers, utilizing the immune system as a very sensitive marker of molecular pathological processes. For validation purposes, we chose intracranial meningioma tumors as a model system since they occur very frequently, are mostly benign, and are genetically stable. RESULTS: A total of 183 blood samples from 93 meningioma patients (WHO stages I-III) and 90 healthy controls were screened for seroreactivity with a set of 57 meningioma-associated antigens. We tested several established statistical learning methods on the resulting reactivity patterns using 10-fold cross validation. The best performance was achieved by Naïve Bayes classifiers. With this classification method, our framework, called the Minimally Invasive Multiple Marker (MIMM) approach, yielded a specificity of 96.2%, a sensitivity of 84.5%, and an accuracy of 90.3%. The respective area under the ROC curve was 0.957. Detailed analysis revealed that prediction performs particularly well on low-grade (WHO I) tumors, consistent with our goal of early-stage tumor detection. For these tumors, the best classification result, with a specificity of 97.5%, a sensitivity of 91.3%, an accuracy of 95.6%, and an area under the ROC curve of 0.971, was achieved using a set of only 12 antigen markers. This antigen set was detected by a subset selection method based on mutual information. Remarkably, our study shows that the inclusion of non-specific antigens, detected not only in tumor but also in normal sera, increases the performance significantly, since non-specific antigens contribute additional diagnostic information.
CONCLUSION: Our approach offers the possibility to screen members of risk groups as a matter of routine, so that tumors can hopefully be diagnosed immediately after their genesis. Early detection will ultimately result in a higher cure rate and a lower morbidity rate.
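As a quick consistency check of the figures reported for the full data set, overall accuracy can be recovered from sensitivity, specificity, and the two group sizes (93 patients, 90 controls). The function name `overall_accuracy` is an assumption of this sketch, and since the reported values are cross-validation results, the relation is expected to hold only approximately.

```python
def overall_accuracy(
    sensitivity: float, specificity: float, n_pos: int, n_neg: int
) -> float:
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    true_positives = sensitivity * n_pos   # correctly classified patients
    true_negatives = specificity * n_neg   # correctly classified controls
    return (true_positives + true_negatives) / (n_pos + n_neg)


# Values from the abstract: 93 meningioma patients, 90 healthy controls.
acc = overall_accuracy(sensitivity=0.845, specificity=0.962, n_pos=93, n_neg=90)
print(f"{acc:.3f}")  # prints 0.903
```

The result agrees with the reported accuracy of 90.3%. The same check cannot be repeated for the WHO I subset, because the abstract does not state its group sizes.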

    Glycosylation Patterns of Proteins Studied by Liquid Chromatography-Mass Spectrometry and Bioinformatic Tools

    Due to their extensive structural heterogeneity, the elucidation of glycosylation patterns in glycoproteins such as the subunits of chorionic gonadotropin (CG), CG-alpha and CG-beta, remains one of the most challenging problems in the proteomic analysis of posttranslational modifications. Consequently, glycosylation is usually studied after decomposition of the intact proteins to the proteolytic peptide level. However, with this approach all information about the combination of the different glycopeptides in the intact protein is lost. In this study we have, therefore, attempted to combine the results of glycan identification after tryptic digestion with molecular mass measurements on the intact glycoproteins. Despite the extremely high number of possible combinations of the glycans identified in the tryptic peptides by high-performance liquid chromatography-mass spectrometry (> 1,000 for CG-alpha and > 10,000 for CG-beta), the mass spectra of intact CG-alpha and CG-beta revealed only a limited number of glycoforms present in CG preparations from pools of pregnancy urines. Peak annotations for CG-alpha were performed with the help of an algorithm that generates a database containing all possible modifications of the proteins (including possible artificial modifications such as oxidation or truncation) and subsequently searches for combinations fitting the mass difference between the polypeptide backbone and the measured molecular masses. Fourteen different glycoforms of CG-alpha, including methionine-oxidized and N-terminally truncated forms, were readily identified. For CG-beta, however, the relatively high mass accuracy of ± 2 Da was still insufficient to unambiguously assign the possible combinations of posttranslational modifications. Finally, the mass spectrometric fingerprints of the intact molecules were shown to be very useful for the characterization of glycosylation patterns in different CG preparations.
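The combination search described above can be sketched as an exhaustive enumeration: choose one option per glycosylation site (and per optional modification), sum the added masses, and keep the combinations that explain the difference between the measured mass and the polypeptide backbone within a tolerance. The site options, glycan labels, and all masses below are invented placeholders for illustration, not values from the study, and `find_combinations` is a hypothetical name.

```python
from itertools import product
from typing import List, Sequence, Tuple

Option = Tuple[str, float]  # (label, mass added in Da)


def find_combinations(
    backbone_mass: float,
    measured_mass: float,
    site_options: Sequence[Sequence[Option]],
    tol: float,
) -> List[Tuple[str, ...]]:
    """Return all per-site choices whose summed mass matches the mass
    difference between measurement and backbone within +/- tol."""
    target = measured_mass - backbone_mass
    hits = []
    for combo in product(*site_options):
        total = sum(mass for _, mass in combo)
        if abs(total - target) <= tol:
            hits.append(tuple(label for label, _ in combo))
    return hits


# Placeholder options: two glycosylation sites and one optional oxidation.
site_1 = [("none", 0.0), ("G1", 1000.0), ("G2", 1500.0)]
site_2 = [("none", 0.0), ("G1", 1000.0), ("G3", 2200.0)]
oxidation = [("unmodified", 0.0), ("oxidation", 16.0)]

print(find_combinations(10000.0, 13716.0, [site_1, site_2, oxidation], tol=2.0))
```

With more sites and more glycans per site, the number of combinations grows multiplicatively, and several of them can fall within the same tolerance window. This is the situation the abstract reports for CG-beta, where ± 2 Da no longer yields an unambiguous assignment.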

    NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction

    BACKGROUND: NMR chemical shift prediction plays an important role in various applications in computational biology. Among others, structure determination, structure optimization, and the scoring of docking results can profit from efficient and accurate chemical shift estimation from a three-dimensional model. A variety of NMR chemical shift prediction approaches have been presented in the past, but nearly all of these rely on laborious manual data set preparation, and the training itself is not automated, making retraining the model (e.g., when new data becomes available) or testing new models a time-consuming manual chore. RESULTS: In this work, we present the framework NightShift (NMR Shift Inference by General Hybrid Model Training), which enables automated data set generation as well as model training and evaluation of protein NMR chemical shift prediction. In addition to this main result – the NightShift framework itself – we describe the resulting, automatically generated, data set and, as a proof-of-concept, a random forest model called Spinster that was built using the pipeline. CONCLUSION: By demonstrating that the performance of the automatically generated predictors is at least on par with the state of the art, we conclude that automated data set and predictor generation is well-suited for the design of NMR chemical shift estimators. The framework can be downloaded from https://bitbucket.org/akdehof/nightshift. It requires the open source Biochemical Algorithms Library (BALL), and is available under the conditions of the GNU Lesser General Public License (LGPL). We additionally offer a browser-based user interface to our NightShift instance employing the Galaxy framework via https://ballaxy.bioinf.uni-sb.de/

    Neuropsychological Testing and Machine Learning Distinguish Alzheimer’s Disease from Other Causes for Cognitive Impairment

    With promising results in recent treatment trials for Alzheimer’s disease (AD), it becomes increasingly important to distinguish AD at early stages from other causes of cognitive impairment. However, existing diagnostic methods are either invasive (lumbar punctures, PET) or inaccurate (Magnetic Resonance Imaging, MRI). This study investigates the potential of neuropsychological testing (NPT) to specifically identify those patients with possible AD among a sample of 158 patients with Mild Cognitive Impairment (MCI) or dementia due to various causes. Patients were divided into an early-stage and a late-stage group according to their Mini Mental State Examination (MMSE) score and labeled as AD or non-AD patients based on a post-mortem validated threshold of the ratio between total tau and beta amyloid in the cerebrospinal fluid (CSF; Total tau/Aβ(1–42) ratio, TB ratio). All patients completed the established Consortium to Establish a Registry for Alzheimer’s Disease–Neuropsychological Assessment Battery (CERAD-NAB) test battery and two additional newly developed neuropsychological tests (recollection and verbal comprehension) that aimed at carving out specific Alzheimer-typical deficits. Based on these test results, an underlying AD (pathologically increased TB ratio) was predicted with a machine learning algorithm. To this end, the algorithm was trained in each case on all patients except the one to be predicted (leave-one-out validation). In the total group, 82% of the patients could be correctly identified as AD or non-AD. In the early group with mild general cognitive impairment, classification accuracy increased to 89%. NPT thus seems to be capable of discriminating between AD patients and patients with cognitive impairment due to other neurodegenerative or vascular causes with high accuracy, and may be used for screening in clinical routine and drug studies, especially in the early course of this disease.
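The leave-one-out validation scheme mentioned above can be sketched as follows. The abstract does not name the machine learning algorithm, so a nearest-centroid classifier is used here purely as a stand-in; the feature vectors and labels are invented toy data, not patient data, and all function names are assumptions of this example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (test scores, label)


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train: List[Sample], x: Sequence[float]) -> str:
    """Stand-in classifier: assign x to the class with the nearest centroid."""
    best_label, best_dist = "", float("inf")
    for label in sorted({lab for _, lab in train}):
        c = centroid([feat for feat, lab in train if lab == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(data: List[Sample]) -> float:
    """Train on all samples except one, predict the held-out one, repeat."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, x) == y:
            correct += 1
    return correct / len(data)


# Invented toy data with two well-separated groups.
toy = [
    ((1.0, 1.0), "AD"), ((1.0, 2.0), "AD"), ((2.0, 1.0), "AD"),
    ((8.0, 8.0), "non-AD"), ((8.0, 9.0), "non-AD"), ((9.0, 8.0), "non-AD"),
]
print(leave_one_out_accuracy(toy))
```

Each patient is thus predicted by a model that never saw that patient during training, which is why the reported 82% and 89% can be read as estimates of performance on unseen cases rather than as training accuracy.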